principal component score
Noisy Data Visualization using Functional Data Analysis
Chen, Haozhe, Correa, Andres Felipe Duque, Wolf, Guy, Moon, Kevin R.
Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many existing methods fail to capture the underlying structure of the data. The method called Empirical Intrinsic Geometry (EIG) was previously proposed for performing dimensionality reduction on high dimensional dynamical processes while theoretically eliminating all noise. However, implementing EIG in practice requires the construction of high-dimensional histograms, which suffer from the curse of dimensionality. Here we propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes that adapts the EIG framework while using approaches from functional data analysis to mitigate the curse of dimensionality. We experimentally demonstrate that the resulting method outperforms a variant of EIG designed for visualization in terms of capturing the true structure, hyperparameter robustness, and computational speed. We then use our method to visualize EEG brain measurements of sleep activity.
Joint machine learning analysis of muon spectroscopy data from different materials
Tula, T., Möller, G., Quintanilla, J., Giblin, S. R., Hillier, A. D., McCabe, E. E., Ramos, S., Barker, D. S., Gibson, S.
Machine learning (ML) methods have proved to be a very successful tool in physical sciences, especially when applied to experimental data analysis. Artificial intelligence is particularly good at recognizing patterns in high dimensional data, where it usually outperforms humans. Here we applied a simple ML tool called principal component analysis (PCA) to study data from muon spectroscopy. The measured quantity from this experiment is an asymmetry function, which holds the information about the average intrinsic magnetic field of the sample. A change in the asymmetry function might indicate a phase transition; however, these changes can be very subtle, and existing methods of analyzing the data require knowledge about the specific physics of the material. PCA is an unsupervised ML tool, which means that no assumption about the input data is required, yet we found that it still can be successfully applied to asymmetry curves, and the indications of phase transitions can be recovered. The method was applied to a range of magnetic materials with different underlying physics. We discovered that performing PCA on all those materials simultaneously can have a positive effect on the clarity of phase transition indicators and can also improve the detection of the most important variations of asymmetry functions. For this joint PCA we introduce a simple way to track the contributions from different materials for a more meaningful analysis.
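The joint analysis described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the asymmetry curves of every material are sampled on a common time grid, and the per-material bookkeeping (the share of each component's squared scores that comes from each material) is a hypothetical stand-in for the tracking scheme the abstract mentions. The names `joint_pca` and `material_contributions` are invented for this sketch.

```python
import numpy as np

def joint_pca(curves_by_material, n_components=2):
    """PCA on asymmetry curves pooled across several materials.

    curves_by_material: dict mapping a material label to an array of shape
    (n_curves, n_time_bins). All arrays are assumed to share one time grid.
    """
    # Remember which material each pooled curve came from.
    labels = np.concatenate(
        [np.full(len(c), name) for name, c in curves_by_material.items()]
    )
    X = np.vstack(list(curves_by_material.values()))
    Xc = X - X.mean(axis=0)  # center each time bin over the pooled curves
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T  # principal component scores, one row per curve
    return labels, scores, components

def material_contributions(labels, scores):
    """Share of each component's total squared score carried by each material."""
    total = (scores ** 2).sum(axis=0)
    return {
        name: (scores[labels == name] ** 2).sum(axis=0) / total
        for name in np.unique(labels)
    }

# Synthetic stand-in curves (not real muon data), purely for illustration.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
demo = {
    "material_A": np.exp(-np.outer(rng.uniform(1, 2, 8), t)),
    "material_B": np.cos(np.outer(rng.uniform(3, 5, 6), t)),
}
labels, scores, components = joint_pca(demo, n_components=2)
shares = material_contributions(labels, scores)
```

Plotting a material's scores against temperature is then one way to look for the phase-transition indicators the abstract refers to, while `shares` shows which materials dominate each component.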
Multipopulation mortality modelling and forecasting: The multivariate functional principal component with time weightings approaches
Human mortality patterns and trajectories in closely related populations are likely linked together and share similarities. It is therefore desirable to model them simultaneously while taking their heterogeneity into account. This paper introduces two new models for jointly modelling and forecasting the mortality of multiple subpopulations, both adapted from multivariate functional principal component analysis techniques. The first model extends the independent functional data model to a multi-population modelling setting. In the second, we propose a novel multivariate functional principal component method for coherent modelling. Its design reflects the idea that when several subpopulation groups share similar socio-economic conditions or common biological characteristics, such closely connected groups are expected to evolve in a non-diverging fashion. We demonstrate the proposed methods using sex-specific mortality data. Their forecast performance is further compared with that of several existing models, including the independent functional data model and the Product-Ratio model, using mortality data from ten developed countries. Our experimental results show that the first proposed model maintains forecast ability comparable to that of the existing methods. In contrast, the second proposed model outperforms both the first model and the existing models in forecast accuracy, and has several other desirable properties.
Dimensionality Reduction for Binary Data through the Projection of Natural Parameters
Landgraf, Andrew J., Lee, Yoonkyung
Principal component analysis (PCA) for binary data, known as logistic PCA, has become a popular approach to dimensionality reduction of binary data. It is motivated as an extension of ordinary PCA by means of a matrix factorization, akin to the singular value decomposition, that maximizes the Bernoulli log-likelihood. We propose a new formulation of logistic PCA which extends Pearson's formulation of a low-dimensional data representation with minimum error to binary data. Our formulation does not require a matrix factorization, as previous methods do, but instead looks for projections of the natural parameters from the saturated model. Due to this difference, the number of parameters does not grow with the number of observations, and the principal component scores on new data can be computed with simple matrix multiplication. We derive explicit solutions for data matrices of special structure and provide computationally efficient algorithms for solving for the principal component loadings. Through simulation experiments and an analysis of medical diagnoses data, we compare our formulation of logistic PCA to the previous formulation as well as ordinary PCA to demonstrate its benefits.
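The claim that scores on new data reduce to a matrix multiplication can be illustrated with a short sketch. Everything here is an assumption rather than the authors' code: the loadings `U` and main-effects vector `mu` are taken as already fitted (a random orthonormal matrix stands in for them below), and because the saturated-model logits for binary data are infinite, a finite value `m` is used in their place. The function names are hypothetical.

```python
import numpy as np

def saturated_natural_params(X, m=4.0):
    """Finite stand-in for the saturated-model logits of binary data.

    An entry equal to 1 maps to +m and an entry equal to 0 maps to -m.
    The value of m is an assumed tuning parameter of this sketch.
    """
    return m * (2.0 * np.asarray(X, dtype=float) - 1.0)

def pc_scores(X_new, U, mu, m=4.0):
    """Scores for new binary rows: center the natural parameters, multiply by U."""
    return (saturated_natural_params(X_new, m) - mu) @ U

def fitted_probabilities(X, U, mu, m=4.0):
    """Project the natural parameters onto span(U), then apply the logistic link."""
    theta_hat = mu + pc_scores(X, U, mu, m) @ U.T
    return 1.0 / (1.0 + np.exp(-theta_hat))

# Placeholder loadings: a random orthonormal 5 x 2 matrix, NOT a fitted model.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(5, 2)))
mu = np.zeros(5)
X_new = rng.integers(0, 2, size=(3, 5))
scores_new = pc_scores(X_new, U, mu)  # shape (3, 2), no refitting needed
```

Because `U` and `mu` are the only fitted quantities, their size depends on the number of variables and components, not on the number of observations, which matches the parameter-count remark in the abstract.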
The Mahalanobis distance for functional data with applications to classification
Joseph, Esdras, Galeano, Pedro, Lillo, Rosa E.
This paper presents a general notion of Mahalanobis distance for functional data that extends the classical multivariate concept to situations where the observed data are points belonging to curves generated by a stochastic process. More precisely, a new semi-distance for functional observations that generalizes the usual Mahalanobis distance for multivariate datasets is introduced. For that, the development uses a regularized square root inverse operator in Hilbert spaces. Some of the main characteristics of the functional Mahalanobis semi-distance are shown. Afterwards, new versions of several well-known functional classification procedures are developed using the Mahalanobis distance for functional data as a measure of proximity between functional observations. The performance of several well-known functional classification procedures is compared with that of those methods used in conjunction with the Mahalanobis distance for functional data, with positive results, through a Monte Carlo study and the analysis of two real data examples.
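One way to make the idea concrete is the sketch below, which regularizes by truncation: the difference between two curves is projected onto the leading functional principal components, and each squared score is divided by the corresponding eigenvalue. This is an illustrative assumption about how the regularized inverse might be realized on discretized curves, not a reproduction of the paper's operator. The function name and the choice of `n_components` are hypothetical.

```python
import numpy as np

def functional_mahalanobis(x, y, sample, n_components=3):
    """Truncated Mahalanobis-type semi-distance between two discretized curves.

    sample: array of shape (n_curves, n_grid_points) on a common, equally
    spaced grid, used to estimate the covariance structure.
    x, y: curves of length n_grid_points on the same grid.
    """
    Xc = sample - sample.mean(axis=0)
    cov = Xc.T @ Xc / (len(sample) - 1)  # sample covariance on the grid
    eigvals, eigvecs = np.linalg.eigh(cov)  # returned in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the leading ones
    lam, phi = eigvals[order], eigvecs[:, order]
    # Score differences along each retained component. On an equally spaced
    # grid the spacing cancels between the squared scores and the eigenvalues.
    diff_scores = (x - y) @ phi
    return np.sqrt(np.sum(diff_scores ** 2 / lam))
```

It is a semi-distance rather than a distance because two different curves whose difference is orthogonal to the retained components are at distance zero. When `n_components` equals the number of grid points and the covariance matrix is invertible, the value coincides with the classical multivariate Mahalanobis distance.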